Prosody-based automatic segmentation of speech into sentences and topics

Abstract

A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
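The abstract does not spell out how the prosodic and lexical information are combined. As a rough illustration only, the sketch below assumes each model emits a per-word posterior probability that a sentence boundary follows the word, and mixes the two by linear interpolation before thresholding. The function names, the interpolation weight, the threshold, and the example probabilities are all hypothetical; this is not presented as the authors' method, which relies on decision trees and hidden Markov models.

```python
from typing import List, Sequence


def combine_posteriors(
    p_prosody: Sequence[float],
    p_lexical: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Linearly interpolate two per-word boundary posteriors.

    `weight` is the share given to the prosodic model (hypothetical
    parameter; in practice it would be tuned on held-out data).
    """
    if len(p_prosody) != len(p_lexical):
        raise ValueError("posterior sequences must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_prosody, p_lexical)]


def segment(
    words: Sequence[str],
    p_boundary: Sequence[float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Split `words` after every word whose boundary posterior meets the threshold."""
    if len(words) != len(p_boundary):
        raise ValueError("need one boundary probability per word")
    sentences: List[List[str]] = []
    current: List[str] = []
    for word, p in zip(words, p_boundary):
        current.append(word)
        if p >= threshold:
            sentences.append(current)
            current = []
    if current:  # trailing words with no detected boundary
        sentences.append(current)
    return sentences


# Made-up posteriors for illustration; not data from the paper.
words = ["good", "evening", "here", "is", "the", "news"]
p_prosody = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8]
p_lexical = [0.2, 0.7, 0.1, 0.3, 0.1, 0.9]

combined = combine_posteriors(p_prosody, p_lexical, weight=0.5)
sentences = segment(words, combined, threshold=0.5)
```

With these made-up values the combined posterior crosses the threshold after "evening" and after "news", so the word stream is split into two units. A weight of 1.0 reduces to the prosody-only model and 0.0 to the word-only model, which makes it easy to compare the three configurations the abstract discusses.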